Applying coupon-collecting theory to computer-aided assessments

C. M. Goldie, R. Cornish and C. L. Robinson



Abstract 

Computer-based tests with randomly generated questions allow a large number of different tests to be generated. Given a fixed number of alternatives for each question, the number of tests that need to be generated before all possible questions have appeared is surprisingly low.

AMS subject classification (MSC2010) 60G70, 60K99 



1 Introduction 

The use of computer-based tests in which questions are randomly generated in some way provides a means whereby a large number of different tests can be generated; many universities currently use such tests as part of the student assessment process. In this paper we present findings that illustrate that, although the number of different possible tests is high and grows very rapidly as the number of alternatives for each question increases, the average number of tests that need to be generated before all possible questions have appeared at least once is surprisingly low. We presented preliminary findings along these lines in Cornish et al.



A computer-based test consists of $q$ questions, each (independently) selected at random from a separate bank of $a$ alternatives. Let $N_q$ be the number of tests one needs to generate in order to see all the $aq$ questions in the $q$ question banks at least once. We are interested in how, for fixed $a$, the random variable $N_q$ grows with the number of questions $q$ in the test. Typically $a$ might be 10 (i.e. each question might have a bank of 10 alternatives), but we shall allow any value of $a$, and give numerical results for $a = 20$ and $a = 5$ as well as for $a = 10$.



2 Coupon collecting 

In the case $q = 1$, i.e. a one-question test, we re-notate $N_q$ as $Y$, and observe that we have an equivalent to the classic coupon-collector problem: your favourite cereal has a coupon in each packet, and there are $a$ alternative types of coupon. $Y$ is the number of packets you have to buy in order to get at least one coupon of each of the $a$ types. The coupon-collector problem has been much studied; see e.g. (Grimmett and Stirzaker, 2001, p. 55).

We can write Y as 

$$Y = Y_1 + Y_2 + \cdots + Y_a,$$

where each $Y_i$ is the number of cereal packets you must buy in order to acquire a new type of coupon, when you already have $i - 1$ types in your collection. Thus $Y_1 = 1$, $Y_2$ is the number of further packets you find you need to gain a second type, and so on. The random variables $Y_1, \dots, Y_a$ are mutually independent. For the distribution of $Y_k$, clearly

$$P(Y_k = y) = \frac{a-k+1}{a}\left(\frac{k-1}{a}\right)^{y-1} \qquad (y = 1, 2, \dots).$$

We say that $X \sim \mathrm{Geom}(p)$, or $X$ has a geometric distribution with parameter $p$, if $P(X = x) = p(1-p)^{x-1}$ for $x = 1, 2, \dots$. Thus $Y_k \sim \mathrm{Geom}((a-k+1)/a)$. As the $\mathrm{Geom}(p)$ distribution has expectation $1/p$, it follows that

$$EY = \sum_{k=1}^{a} EY_k = \sum_{k=1}^{a} \frac{a}{a-k+1} = a\sum_{k=1}^{a} \frac{1}{k}.$$

For different values of $a$ we therefore have the following.

a       5       10      15      20
EY      11.42   29.29   49.77   71.95

In other words, if there are 10 coupons to collect then an average of 29 packets of cereal would have to be bought in order to obtain all 10 of these coupons. In the context of computer-based tests, if a test had one question selected at random from a bank of 10 alternatives, an average of 29 tests would need to be generated in order to see all the questions at least once.
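The closed form $EY = a\sum_{k=1}^{a} 1/k$ is easy to check numerically. The following minimal Python sketch (the function name `expected_coupons` is our own, not from the paper) sums the geometric means $a/(a-k+1)$ in exact rational arithmetic and prints the values in the table above.

```python
from fractions import Fraction


def expected_coupons(a: int) -> Fraction:
    """E[Y] = sum_{k=1}^{a} a/(a-k+1) = a * (1 + 1/2 + ... + 1/a), computed exactly."""
    return sum(Fraction(a, a - k + 1) for k in range(1, a + 1))


for a in (5, 10, 15, 20):
    print(a, f"{float(expected_coupons(a)):.2f}")  # 11.42, 29.29, 49.77, 71.95
```

Exact fractions avoid any rounding until the final display, so the printed values are the correctly rounded means.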

To apply the theory to tests with more than one question we will also need an explicit expression for $P(Y > y)$. To revert to the language of coupons in cereal packets, let us number the coupon types $1, 2, \dots, a$, and let $A_i$ be the event that type $i$ does not occur in the first $y$ cereal packets bought. The event that $Y > y$ is then the union of the events $A_1, A_2, \dots, A_a$. So by the inclusion-exclusion formula,



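$$\begin{aligned} P(Y > y) &= P\Big(\bigcup_{i=1}^{a} A_i\Big) \\ &= \sum_{i} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \cdots \\ &\quad + (-1)^{a+1} P(A_1 \cap \cdots \cap A_a). \end{aligned}$$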

Obviously $P(A_i) = (1 - 1/a)^y$ for each $i$. For distinct $i$ and $j$, $A_i \cap A_j$ is the event that a particular two of the $a$ coupon types do not occur in the first $y$ purchases, so has probability $(1 - 2/a)^y$. Similarly $A_i \cap A_j \cap A_k$, for distinct $i$, $j$ and $k$, has probability $(1 - 3/a)^y$, and so on. We conclude that



$$P(Y > y) = \sum_{k=1}^{a} (-1)^{k+1}\binom{a}{k}\Big(1 - \frac{k}{a}\Big)^{y} \tag{2.1}$$



(when $y > 0$ the final term of the sum is zero). Let $F$ be the distribution function for $Y$; thus the above is equivalent to

$$F(y) := P(Y \le y) = \sum_{k=0}^{a} (-1)^{k}\binom{a}{k}\Big(1 - \frac{k}{a}\Big)^{y} \qquad (y = a, a+1, a+2, \dots). \tag{2.2}$$

This is a classical formula for the probability that all cells are occupied when $y$ balls are distributed at random among $a$ cells; cf. (Feller, 1968, (11.11)). The right-hand side of (2.2) has value 0 when $y = 0, 1, \dots, a-1$.
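Formulas (2.1) and (2.2) translate directly into code. The sketch below (Python; the names `cdf` and `tail` are ours) evaluates (2.2) with `math.fsum` to limit rounding error in the alternating sum, confirms numerically that $F(y) = 0$ for $y < a$, and recovers $EY$ by summing $P(Y > y)$ over $y \ge 0$. For large $a$ the alternating sum suffers from cancellation, so this direct form is only a sketch for moderate $a$ such as those used in this paper.

```python
from math import comb, fsum


def cdf(a: int, y: int) -> float:
    """F(y) = P(Y <= y) via (2.2): the chance that y packets contain all a types."""
    return fsum((-1) ** k * comb(a, k) * (1 - k / a) ** y for k in range(a + 1))


def tail(a: int, y: int) -> float:
    """P(Y > y) = 1 - F(y), equivalent to (2.1)."""
    return 1.0 - cdf(a, y)


a = 10
# F(y) vanishes (up to rounding) for y < a: fewer than a packets cannot hold a types.
print(max(abs(cdf(a, y)) for y in range(a)))
# E[Y] = sum_{y >= 0} P(Y > y); truncating at y = 400 leaves a negligible tail.
print(fsum(tail(a, y) for y in range(400)))  # close to 10 * H_10 = 29.2897...
```

For $a = 3$, $y = 3$ the formula gives $F(3) = 3!/3^3 = 2/9$, the probability that the first three packets hold three different coupons.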
We return to the initial question. We have a test containing $q$ questions, each selected at random from a bank of $a$ alternatives. $N_q$ is defined to be the number of tests that need to be generated in order to see all possible $aq$ questions at least once.

For question $j$ of the test, let $Y_j$ be the number of tests needed to see all the $a$ alternatives in its question bank. The random variables $Y_1, Y_2, \dots, Y_q$ are mutually independent, each distributed as the $Y$ of the previous section, and $N_q$ is their maximum:

$$N_q = \max\{Y_1, Y_2, \dots, Y_q\}.$$

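We thus have

$$\begin{aligned} EN_q &= \sum_{n=0}^{\infty} P(N_q > n) = \sum_{n=0}^{\infty}\big(1 - P(N_q \le n)\big) \\ &= \sum_{n=0}^{\infty}\Big(1 - \prod_{j=1}^{q} P(Y_j \le n)\Big) = \sum_{n=0}^{\infty}\Big(1 - \prod_{j=1}^{q}\big(1 - P(Y_j > n)\big)\Big) \\ &= \sum_{n=0}^{\infty}\Big(1 - \big(1 - P(Y > n)\big)^{q}\Big). \end{aligned} \tag{3.1}$$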

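This can be reduced to a finite sum as follows, expanding $1 - (1-p)^q = \sum_{m=1}^{q}(-1)^{m+1}\binom{q}{m}p^m$ and then using (2.1), which holds for all $n \ge 0$ with the convention $0^0 = 1$:

$$\begin{aligned} EN_q &= \sum_{n=0}^{\infty}\sum_{m=1}^{q}(-1)^{m+1}\binom{q}{m}\big(P(Y > n)\big)^{m} \\ &= \sum_{m=1}^{q}(-1)^{m+1}\binom{q}{m}\sum_{n=0}^{\infty}\Big(\sum_{j=1}^{a}(-1)^{j+1}\binom{a}{j}\Big(1 - \frac{j}{a}\Big)^{n}\Big)^{m} \\ &= -\sum_{m=1}^{q}\binom{q}{m}\sum_{j_1=1}^{a}\cdots\sum_{j_m=1}^{a}(-1)^{j_1+\cdots+j_m}\binom{a}{j_1}\cdots\binom{a}{j_m}\sum_{n=0}^{\infty}\prod_{i=1}^{m}\Big(1 - \frac{j_i}{a}\Big)^{n} \\ &= -\sum_{m=1}^{q}\binom{q}{m}\sum_{j_1=1}^{a}\cdots\sum_{j_m=1}^{a}\frac{(-1)^{j_1+\cdots+j_m}\binom{a}{j_1}\cdots\binom{a}{j_m}}{1 - \prod_{i=1}^{m}(1 - j_i/a)}. \end{aligned}$$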



This, though, is not well suited to computation (the $m$th inner sum has $a^m$ terms), and we have used (3.1) for the numerical results below.
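To illustrate why (3.1) is preferred in practice, here is a hedged Python sketch (all function names are ours) that evaluates $EN_q$ both from the series (3.1), truncated once its terms fall below a tolerance, and from the finite sum, whose cost grows like $a^q$ and which is therefore usable only for very small $a$ and $q$. The two agree on small cases, and for $q = 1$ both reduce to $a\sum_{k=1}^{a} 1/k$.

```python
from itertools import product
from math import comb, expm1, fsum, log1p, prod


def tail(a, n):
    """P(Y > n) from (2.1); exactly 1 for n < a because Y >= a."""
    if n < a:
        return 1.0
    return fsum((-1) ** (k + 1) * comb(a, k) * (1 - k / a) ** n for k in range(1, a + 1))


def expected_max_series(a, q, tol=1e-14):
    """E N_q via (3.1): sum over n >= 0 of 1 - (1 - P(Y > n))**q, truncated."""
    total, n = 0.0, 0
    while True:
        t = tail(a, n)
        # -expm1(q*log1p(-t)) computes 1 - (1 - t)**q accurately when t is tiny
        term = 1.0 if t >= 1.0 else -expm1(q * log1p(-t))
        total += term
        if n >= a and term < tol:
            return total
        n += 1


def expected_max_finite(a, q):
    """E N_q via the finite sum; cost grows like a**q, so tiny cases only."""
    total = 0.0
    for m in range(1, q + 1):
        inner = 0.0
        for js in product(range(1, a + 1), repeat=m):
            numerator = (-1) ** sum(js) * prod(comb(a, j) for j in js)
            inner += numerator / (1 - prod(1 - j / a for j in js))
        total -= comb(q, m) * inner
    return total


print(expected_max_series(3, 2), expected_max_finite(3, 2))
print(expected_max_series(10, 20))
```

For $a = q = 2$ both functions give $11/3$, which can be confirmed by hand since then $P(Y > n) = 2^{1-n}$ for $n \ge 1$. With these definitions, `expected_max_series(10, 20)` should land close to the value of about 56 quoted for $a = 10$, $q = 20$ in the numerical results.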

Note The way in which CMG got involved in writing this paper was through chancing on a query posted by RC on Allstat, a UK-based electronic mailing list, asking how to calculate the expected number of tests a student would need to access in order to see the complete bank of questions. CMG immediately recognised the query as a form of coupon-collecting problem, but not quite in standard form. What he should have done then was to think and calculate, following Littlewood's famous advice (Littlewood, 1986, p. 93):

"It is of course good policy, and I have often practised it, to begin without going too much into the existing literature".

What he actually did was to seek previous work using Google. With customary speed and accuracy, Google produced a list with Adler and Ross (2001) in position 6. Knowing that Sheldon Ross is unbeatable at combinatorial probability problems, CMG looked up this paper, and was thoroughly led astray. The paper does indeed treat our problem and is an excellent paper, but it is much more general than we needed and sets up a structure that obscures the relatively simple nature of what we needed for this problem. It was better to work the above out from first principles.



4 Asymptotics 

We employ Extreme-Value Theory (EVT) to investigate the random variable $N_q$ as the number of questions $q$ becomes large, the number $a$ of alternatives per question staying fixed. It turns out we are in a case identified by C. W. Anderson in 1970, where a limit fails to exist but there are close bounds above and below. Thus despite the absence of a limit we gain asymptotic results of some precision.
The relevant extreme-value distribution will be the Gumbel distribution, with (cumulative) distribution function $\Lambda(x) = \exp(-e^{-x})$ for all $x \in \mathbb{R}$; write $Z$ for a random variable with the Gumbel distribution.

Throughout this section $a \ge 2$ is an integer, and we set $\alpha := \log(a/(a-1)) > 0$. Proofs of the results in this section are in §5.

A first goal of EVT for the random variables $N_q$ would be to find a norming sequence $a_q > 0$ and a centring sequence $b_q$ such that $(N_q - b_q)/a_q$ has a limit distribution as $q \to \infty$.

Theorem 4.1 There do not exist sequences $a_q > 0$ and $b_q$ such that $(N_q - b_q)/a_q$ has a non-degenerate limit distribution as $q \to \infty$. However, with $b_q := \alpha^{-1}\log(aq)$ we have for all $x \in \mathbb{R}$ that

$$\Lambda(\alpha(x-1)) = \liminf_{q\to\infty} P(N_q - b_q \le x) \le \limsup_{q\to\infty} P(N_q - b_q \le x) = \Lambda(\alpha x). \tag{4.1}$$

Thus $N_q - b_q$, in distribution, is asymptotically between $\alpha^{-1}Z$ and $1 + \alpha^{-1}Z$, with $Z$ Gumbel, and these distributional bounds are sharp.
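The oscillation described by Theorem 4.1 can be seen exactly, without simulation, because $P(N_q - b_q \le x) = F^q(\lfloor x + b_q\rfloor)$, as used in (5.2). The Python sketch below (names ours; $a = 10$, $x = 0$ and the range of $q$ are arbitrary illustrative choices) evaluates this probability for $10^4 \le q \le 2\times 10^4$ and compares its smallest and largest values with $\Lambda(\alpha(x-1))$ and $\Lambda(\alpha x)$.

```python
from math import comb, exp, floor, fsum, log, log1p


def tail(a, y):
    """P(Y > y) from (2.1), for integer y >= a."""
    return fsum((-1) ** (k + 1) * comb(a, k) * (1 - k / a) ** y for k in range(1, a + 1))


def gumbel_cdf(x):
    """Lambda(x) = exp(-e^{-x})."""
    return exp(-exp(-x))


def centred_max_cdf(a, q, x):
    """Exact P(N_q - b_q <= x) = F(floor(x + b_q))**q with b_q = log(a q)/alpha."""
    alpha = log(a / (a - 1))
    y = floor(x + log(a * q) / alpha)
    return exp(q * log1p(-tail(a, y)))


a, x = 10, 0.0
alpha = log(a / (a - 1))
vals = [centred_max_cdf(a, q, x) for q in range(10_000, 20_001)]
print(f"min {min(vals):.4f}  vs  Lambda(alpha(x-1)) = {gumbel_cdf(alpha * (x - 1)):.4f}")
print(f"max {max(vals):.4f}  vs  Lambda(alpha x)     = {gumbel_cdf(alpha * x):.4f}")
```

As $q$ runs through this range, $b_q$ increases by $\log 2/\alpha \approx 6.6$, so $x + b_q$ crosses several integers and the probability sweeps back and forth between the two bounds instead of settling down.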

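To describe the local behaviour, let $\lfloor x\rfloor$ denote the integer part of $x$, $\{x\} := x - \lfloor x\rfloor$ the fractional part, and let $\lceil x\rceil := \lfloor x\rfloor + 1$. Then for each integer $n$,

$$P(N_q - \lceil b_q\rceil = n) - \Lambda(\alpha(n + 1 - \{b_q\})) + \Lambda(\alpha(n - \{b_q\})) \to 0 \quad\text{as } q \to \infty. \tag{4.2}$$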
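Statement (4.2) can likewise be checked numerically. A minimal sketch, assuming the illustrative choice $a = 10$ and a single moderately large $q = 10^4$ (helper names are ours): it compares the exact point probabilities $P(N_q - \lceil b_q\rceil = n) = F^q(n + \lceil b_q\rceil) - F^q(n - 1 + \lceil b_q\rceil)$ with their Gumbel approximations. The code follows the convention $\lceil x\rceil := \lfloor x\rfloor + 1$ defined above.

```python
from math import comb, exp, floor, fsum, log, log1p


def cdf_max(a, q, y):
    """P(N_q <= y) = F(y)**q for integer y >= a, with 1 - F(y) from (2.1)."""
    t = fsum((-1) ** (k + 1) * comb(a, k) * (1 - k / a) ** y for k in range(1, a + 1))
    return exp(q * log1p(-t))


def gumbel_cdf(x):
    return exp(-exp(-x))


a, q = 10, 10_000
alpha = log(a / (a - 1))
b = log(a * q) / alpha
ceil_b = floor(b) + 1   # the convention ceil(x) := floor(x) + 1
frac_b = b - floor(b)   # {b_q}, the fractional part

rows = []
for n in range(-3, 4):
    exact = cdf_max(a, q, n + ceil_b) - cdf_max(a, q, n - 1 + ceil_b)
    approx = gumbel_cdf(alpha * (n + 1 - frac_b)) - gumbel_cdf(alpha * (n - frac_b))
    rows.append((n, exact, approx))
    print(f"n={n:+d}  exact={exact:.5f}  Gumbel={approx:.5f}")
```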

We remark that the Gumbel distribution has mean $\gamma \approx 0.5772$, the Euler-Mascheroni constant, and variance $\pi^2/6$. Its distribution tails decay exponentially or better: $\lim_{x\to\infty} e^{x}(1 - \Lambda(x)) = 1$ and $\lim_{x\to-\infty} e^{-x}\Lambda(x) = 0$. We use these facts below. We first extend the above stochastic boundedness of the sequence $(N_q - b_q)$ to $L^p$-boundedness for all $p$. For the rest of the paper we set $b_q := \alpha^{-1}\log(aq)$ and $R_q := N_q - b_q$.

Theorem 4.2 For each $p \ge 1$, $\sup_{q\in\mathbb{N}} E(|R_q|^p) < \infty$.

Theorem 4.2 implies that the distributional asymptotics of Theorem 4.1 will extend to give asymptotic bounds on moments. Moment convergence in EVT is treated in (Resnick, 1987, §2.1), and we use some of the ideas from the proofs there in proving the results below.
Theorem 4.3

$$\frac{\gamma + \log a}{\alpha} \le \liminf_{q\to\infty}\Big(EN_q - \frac{\log q}{\alpha}\Big) \le \limsup_{q\to\infty}\Big(EN_q - \frac{\log q}{\alpha}\Big) \le \frac{\gamma + \log a}{\alpha} + 1.$$

By similar methods one may obtain bounds on higher moments. We content ourselves with those on the second moment, leading to good bounds on $\operatorname{var} N_q$, the variance of $N_q$.

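Lemma 4.4

$$\begin{aligned} E\big((1 + \alpha^{-1}Z)^2 1_{1+\alpha^{-1}Z<0} + (\alpha^{-1}Z)^2 1_{Z>0}\big) &\le \liminf_{q\to\infty} E(R_q^2) \\ &\le \limsup_{q\to\infty} E(R_q^2) \le E\big((\alpha^{-1}Z)^2 1_{Z<0} + (1 + \alpha^{-1}Z)^2 1_{1+\alpha^{-1}Z>0}\big). \end{aligned} \tag{4.3}$$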

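Theorem 4.5

$$\limsup_{q\to\infty}\Big|\operatorname{var} N_q - \frac{\pi^2}{6\alpha^2}\Big| \le \theta(a) + 1 - e^{-1} + \frac{2(\gamma + E_1(1))}{\alpha},$$

where $\theta(a) := E\big((1 + \alpha^{-1}Z)^2 1_{0<1+\alpha^{-1}Z<1}\big)$ satisfies $0 < \theta(a) < 1$, and $E_1(1) = \int_1^\infty t^{-1}e^{-t}\,dt \approx 0.2194$.

Here, $E_1(1)$ is a value of the exponential integral (cf. (Abramowitz and Stegun, 1965, §5.1)) $E_n(x) = \int_1^\infty t^{-n}e^{-xt}\,dt$.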



5 Proofs for §4

Proof of Theorem 4.1 In (2.1) the $k = 1$ term dominates for large $y$, so

$$P(Y > y) = a(1 - 1/a)^{y}(1 + o(1)) \tag{5.1}$$



as $y \to \infty$ through integer values. As noted in (Anderson, 1970, §1), the fact that the integer-valued random variable $Y$ has

$$\frac{P(Y > y)}{P(Y > y + 1)} \to \frac{a}{a-1} \ne 1 \quad\text{as } y \to \infty$$

prevents it from belonging to the 'domain of attraction' for maxima of any extreme-value distribution, and so no non-trivial limit distribution for $(N_q - b_q)/a_q$, for any choices of $a_q$ and $b_q$, can exist.

For the rest of the proof, $b_q := \alpha^{-1}\log(aq)$. Via the definition of $\alpha$, (5.1) gives that $F(y) = 1 - ae^{-\alpha y}(1 + o(1))$ as $y \to \infty$ through integer values. So for each fixed $x$,

$$P(N_q - b_q \le x) = F^{q}(\lfloor x + b_q\rfloor) = \big(1 - ae^{-\alpha\lfloor x + b_q\rfloor}(1 + o(1))\big)^{q} \tag{5.2}$$

as $q \to \infty$. Then

$$P(N_q - b_q \le x) \le \big(1 - ae^{-\alpha(x + b_q)}(1 + o(1))\big)^{q} = \Big(1 - \frac{e^{-\alpha x}(1 + o(1))}{q}\Big)^{q} \to \Lambda(\alpha x) \quad\text{as } q \to \infty.$$



With $x$ still fixed we define the sequence $(q(k))_{k\ge1}$ to be those $q$ for which the interval $(x + b_{q-1}, x + b_q]$ contains one or more integers, i.e. for which $x + b_{q-1} < \lfloor x + b_q\rfloor$. Since $b_q \to \infty$ this is an infinite sequence, and since $b_{q+1} - b_q \to 0$ we have $x + b_{q(k)} - \lfloor x + b_{q(k)}\rfloor \to 0$ as $k \to \infty$, whence with (5.2) we conclude that $P(N_{q(k)} - b_{q(k)} \le x) \to \Lambda(\alpha x)$ as $k \to \infty$. Thus $\limsup_{q\to\infty} P(N_q - b_q \le x) = \Lambda(\alpha x)$.
For the limit inferior,

$$P(N_q - b_q \le x) \ge \big(1 - ae^{-\alpha(x - 1 + b_q)}(1 + o(1))\big)^{q} = \Big(1 - \frac{e^{-\alpha(x-1)}(1 + o(1))}{q}\Big)^{q} \to \Lambda(\alpha(x-1)) \quad\text{as } q \to \infty.$$

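With the same sequence $(q(k))$ as above, note that $x + b_{q(k)-1} - \lfloor x + b_{q(k)-1}\rfloor \to 1$ as $k \to \infty$, so

$$\begin{aligned} P(N_{q(k)-1} - b_{q(k)-1} \le x) &= \big(1 - ae^{-\alpha\lfloor x + b_{q(k)-1}\rfloor}(1 + o(1))\big)^{q(k)-1} \\ &= \Big(1 - \frac{e^{-\alpha(x - (x + b_{q(k)-1} - \lfloor x + b_{q(k)-1}\rfloor))}(1 + o(1))}{q(k)-1}\Big)^{q(k)-1} \end{aligned}$$

by (5.2). The right-hand side converges to $\Lambda(\alpha(x-1))$. Thus $\liminf_{q\to\infty} P(N_q - b_q \le x) = \Lambda(\alpha(x-1))$. This establishes (4.1).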

The extension to local behaviour is due to Anderson (1980). To gain the conclusion as we formulate it, (4.2), we may argue directly: fix an integer $n$ and start from

$$P(N_q - \lceil b_q\rceil \le n) = F^{q}(n + \lceil b_q\rceil) = \big(1 - ae^{-\alpha(n + \lceil b_q\rceil)}(1 + o(1))\big)^{q}$$

as $q \to \infty$. Now

$$ae^{-\alpha(n + \lceil b_q\rceil)} = \frac{e^{-\alpha(n + \lceil b_q\rceil - b_q)}}{q} = \frac{e^{-\alpha(n + 1 - \{b_q\})}}{q},$$

and as the convergence in $(1 - c/q)^q \to e^{-c}$ is locally uniform in $c$ we deduce that

$$P(N_q - \lceil b_q\rceil \le n) - \Lambda(\alpha(n + 1 - \{b_q\})) \to 0 \quad\text{as } q \to \infty.$$

Subtract from this the corresponding formula with $n$ replaced by $n - 1$, and (4.2) follows. □

For the next result we need a uniform bound on expressions of the form $1 - (1 - u/n)^n$:

Lemma 5.1 For any $u_0 > 0$ there exists a positive integer $n_1 = n_1(u_0)$ such that for $n \ge n_1$ and $0 \le u \le u_0$,

$$1 - \Big(1 - \frac{u}{n}\Big)^{n} \le 2u.$$

Proof There exists $t_0 > 0$ (its value is about 0.7968) such that $\log(1 - t_0) = -2t_0$, so $\log(1 - t) \ge -2t$ for $0 \le t \le t_0$. Take $n_1 \ge u_0/t_0$; then $1 - (1 - u/n)^n \le 1 - e^{-2u}$ for $n \ge n_1$ and $0 \le u \le u_0$, and as $1 - e^{-2u} \le 2u$ the result follows. □
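The constant $t_0$ in the proof of Lemma 5.1 is the root of $\log(1 - t) + 2t = 0$ in $(0, 1)$, i.e. other than $t = 0$. A short bisection (Python; our own illustration, not part of the paper) locates it and spot-checks the inequality of the lemma for one illustrative choice, $u_0 = 3$, on a finite grid of $n$ and $u$.

```python
from math import ceil, log


def g(t):
    """g(t) = log(1 - t) + 2t; t0 is its root in (0, 1)."""
    return log(1 - t) + 2 * t


lo, hi = 0.5, 0.99  # g(0.5) > 0 > g(0.99), and g is decreasing on [0.5, 1)
for _ in range(60):
    mid = (lo + hi) / 2
    if g(mid) > 0:
        lo = mid
    else:
        hi = mid
t0 = (lo + hi) / 2
print(round(t0, 4))  # 0.7968

# Spot-check Lemma 5.1 with u0 = 3 and n1 = ceil(u0 / t0) on a grid of (n, u).
u0 = 3.0
n1 = ceil(u0 / t0)
ok = all(1 - (1 - u / n) ** n <= 2 * u
         for n in range(n1, 200)
         for u in (u0 * i / 100 for i in range(101)))
print(n1, ok)
```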



Proof of Theorem 4.2 We write $1_T := 1$ if statement $T$ is true, $1_T := 0$ if $T$ is false. Fix $n \in \mathbb{N}$. The distribution of $N_q$ is such that $E(R_q^{2n}) < \infty$ for all $q$. We prove that $\sup_{q\in\mathbb{N}} E(R_q^{2n}) < \infty$. Now

$$E(R_q^{2n}) = \int_{(-\infty,0]} x^{2n}\,dP(R_q \le x) - \int_{(0,\infty)} x^{2n}\,dP(R_q > x),$$

and so, on integrating by parts,

$$E(R_q^{2n}) = -2n\int_{-\infty}^{0} x^{2n-1}P(R_q \le x)\,dx + 2n\int_{0}^{\infty} x^{2n-1}P(R_q > x)\,dx =: A + B,$$

say.

In (2.1) the right-hand side is asymptotic to its first term, $ae^{-\alpha y}$. There exists $y_0$ such that for real $y \ge y_0$ (not just integer $y$), $P(Y > y) \le 2ae^{-\alpha(y-1)}$. So for $x \ge 0$ and $q \ge a^{-1}e^{\alpha y_0}$,

$$P(Y > x + b_q) \le 2ae^{-\alpha(x + b_q - 1)} = \frac{2e^{\alpha}}{q}e^{-\alpha x},$$

and hence

$$P(R_q > x) = 1 - \big(1 - P(Y > x + b_q)\big)^{q} \le 1 - \Big(1 - \frac{2e^{\alpha}}{q}e^{-\alpha x}\Big)^{q}.$$
Now apply Lemma 5.1. It follows that there exists $q_1$ such that for $q \ge q_1$ and $x \ge 0$,

$$P(R_q > x) \le 4e^{\alpha - \alpha x}.$$

Therefore, for $q \ge q_1$,

$$B = 2n\int_0^\infty x^{2n-1}P(R_q > x)\,dx \le 8n\int_0^\infty x^{2n-1}e^{\alpha - \alpha x}\,dx < \infty.$$
■/o ./0 

It remains to bound $A$. Returning again to (2.1), observe that we may find $y_1$ so that $P(Y > y) \ge \tfrac12 ae^{-\alpha y}$ for all real $y \ge y_1$. Therefore for $x \ge y_1 - b_q$ we have

$$P(Y > x + b_q) \ge \tfrac12 ae^{-\alpha(x + b_q)} = \frac{1}{2q}e^{-\alpha x},$$

and so

$$P(R_q \le x) = \big(1 - P(Y > x + b_q)\big)^{q} \le \exp\big(-qP(Y > x + b_q)\big) \le \exp\big(-\tfrac12 e^{-\alpha x}\big) \quad\text{for } x \ge y_1 - b_q. \tag{5.3}$$

In $A = -2n\int_{-\infty}^{0} x^{2n-1}P(R_q \le x)\,dx$, the lower endpoint of the interval of integration may be taken to be $-b_q$, as the integrand vanishes below this point, and we then choose further to split the integral to obtain

$$A = -2n\int_{y_1 - b_q}^{0} x^{2n-1}P(R_q \le x)\,dx - 2n\int_{-b_q}^{y_1 - b_q} x^{2n-1}P(R_q \le x)\,dx =: A_1 + A_2,$$

say. If we take $q$ so large that $b_q > y_1$, (5.3) gives

$$A_1 \le -2n\int_{y_1 - b_q}^{0} x^{2n-1}\exp\big(-\tfrac12 e^{-\alpha x}\big)\,dx \le -2n\int_{-\infty}^{0} x^{2n-1}\exp\big(-\tfrac12 e^{-\alpha x}\big)\,dx < \infty.$$
Finally,

$$\begin{aligned} A_2 &= -2n\int_{-b_q}^{y_1 - b_q} x^{2n-1}F^{q}(x + b_q)\,dx = -2n\int_{0}^{y_1}(u - b_q)^{2n-1}F^{q}(u)\,du \\ &\le 2ny_1 b_q^{2n-1}F^{q}(y_1) = 2ny_1\Big(\frac{\log(aq)}{\alpha}\Big)^{2n-1}F^{q}(y_1). \end{aligned}$$

This tends to 0 as $q \to \infty$, because $0 \le F(y_1) < 1$.

We have shown that $\limsup_{q\to\infty} E(R_q^{2n}) < \infty$, so $\sup_{q\in\mathbb{N}} E(R_q^{2n}) < \infty$ as claimed, and the result follows. □

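Before proving Theorem 4.3 we note that (4.1) says that for each $x$,

$$\Lambda(\alpha(x-1)) = \liminf_{q\to\infty} P(R_q \le x) \le \limsup_{q\to\infty} P(R_q \le x) = \Lambda(\alpha x), \tag{5.4}$$

and that what we have to prove is

$$E(\alpha^{-1}Z) \le \liminf_{q\to\infty} ER_q \le \limsup_{q\to\infty} ER_q \le E(1 + \alpha^{-1}Z). \tag{5.5}$$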

We use (5.4) mostly in the form

$$P(\alpha^{-1}Z > x) = \liminf_{q\to\infty} P(R_q > x) \le \limsup_{q\to\infty} P(R_q > x) = P(1 + \alpha^{-1}Z > x), \tag{5.6}$$

obtained by subtracting each component from 1. We make much use of Fatou's Lemma, that for non-negative $f_n$,

$$\int \liminf_{n\to\infty} f_n \le \liminf_{n\to\infty}\int f_n,$$

and also of its extended form: that if $f_n \le f$ and $f$ is integrable then

$$\limsup_{n\to\infty}\int f_n \le \int \limsup_{n\to\infty} f_n.$$

The latter may be deduced from the former by considering $f - f_n$.



Proof of Theorem 4.3 We use the fact that for a random variable $X$ with finite mean, and any constant $c$,

$$E(X1_{X>c}) = cP(X > c) + \int_{c}^{\infty} P(X > x)\,dx, \tag{5.7}$$

as may be proved by integrating $\int_{(c,\infty)} x\,dP(X \le x)$ by parts. We thus have, for $c > 0$,

$$\begin{aligned} ER_q &\le E(R_q1_{R_q>-c}) \\ &= -cP(R_q > -c) + \int_{-c}^{c} P(R_q > x)\,dx + \int_{c}^{\infty} P(R_q > x)\,dx \\ &=: A + B + C, \end{aligned}$$

say. First, by the left-hand equality in (5.6), $\limsup_{q\to\infty} A = -cP(\alpha^{-1}Z > -c)$. Second, from the right-hand equality in (5.6), and the extended Fatou Lemma (take the dominating integrable function to be 1),

$$\limsup_{q\to\infty} B \le \int_{-c}^{c} P(1 + \alpha^{-1}Z > x)\,dx \le \int_{-c}^{\infty} P(1 + \alpha^{-1}Z > x)\,dx.$$

Combining the bounds on $A$ and $B$ yields

$$\begin{aligned} \limsup_{q\to\infty}(A + B) &\le -cP(1 + \alpha^{-1}Z > -c) + \int_{-c}^{\infty} P(1 + \alpha^{-1}Z > x)\,dx \\ &\quad + c\big(P(1 + \alpha^{-1}Z > -c) - P(\alpha^{-1}Z > -c)\big) \\ &= E\big((1 + \alpha^{-1}Z)1_{1+\alpha^{-1}Z>-c}\big) + cP(-c-1 < \alpha^{-1}Z \le -c) \\ &\le E\big((1 + \alpha^{-1}Z)1_{1+\alpha^{-1}Z>-c}\big) + cP(\alpha^{-1}Z \le -c). \end{aligned}$$

For the third upper bound, on $C$, we note (with an eye to the next proof as well) that by Theorem 4.2, $K := \sup_{q\in\mathbb{N}} E(|R_q|^3) < \infty$. Then for $x > 0$, $P(R_q > x) \le K/x^3$, hence $C \le K/(2c^2)$. On combining this bound with that on $A + B$ we gain an upper bound on $\limsup_{q\to\infty} ER_q$ that converges to $E(1 + \alpha^{-1}Z)$ as $c \to \infty$, concluding the proof of the upper bound in (5.5).

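For the lower bound we again use (5.7), this time to write

$$\begin{aligned} ER_q &= E(R_q1_{R_q\le -c}) + E(R_q1_{R_q>-c}) \\ &= E(R_q1_{R_q\le -c}) - cP(R_q > -c) + \int_{-c}^{\infty} P(R_q > x)\,dx \\ &=: A + B + C, \end{aligned}$$

say. First, Fatou's Lemma and then the left-hand equality in (5.6) give

$$\liminf_{q\to\infty} C = \liminf_{q\to\infty}\int_{-c}^{\infty} P(R_q > x)\,dx \ge \int_{-c}^{\infty} P(\alpha^{-1}Z > x)\,dx.$$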
Second,

$$\liminf_{q\to\infty} B = -c\limsup_{q\to\infty} P(R_q > -c) = -cP(1 + \alpha^{-1}Z > -c),$$

this time by the right-hand equality in (5.6). Combining, we find that

$$\begin{aligned} \liminf_{q\to\infty}(B + C) &\ge -cP(\alpha^{-1}Z > -c) + \int_{-c}^{\infty} P(\alpha^{-1}Z > x)\,dx \\ &\quad - c\big(P(1 + \alpha^{-1}Z > -c) - P(\alpha^{-1}Z > -c)\big) \\ &= E(\alpha^{-1}Z1_{\alpha^{-1}Z>-c}) - cP(-c-1 < \alpha^{-1}Z \le -c) \\ &\ge E(\alpha^{-1}Z) - cP(\alpha^{-1}Z \le -c). \end{aligned}$$

Finally, to put a lower bound on $A$ we may again use the 'Markov inequality' method used above for $C$, obtaining $A \ge -K/(2c^2)$. Combining this with the above, we gain a lower bound on $\liminf_{q\to\infty} ER_q$ that converges to $E(\alpha^{-1}Z)$ as $c \to \infty$. We thus obtain the lower bound in (5.5). □

Proof of Lemma 4.4 We use variants of the decompositions in the previous proof. First, the upper bound. With $c > 0$ fixed,

$$\begin{aligned} E(R_q^2) &= E(R_q^2 1_{R_q>-c}) + E(R_q^2 1_{R_q\le -c}) \\ &= c^2P(R_q > -c) + 2\int_{-c}^{\infty} xP(R_q > x)\,dx + E(R_q^2 1_{R_q\le -c}) \\ &= c^2P(R_q > -c) + 2\int_{-c}^{0} xP(R_q > x)\,dx + 2\int_{0}^{c} xP(R_q > x)\,dx \\ &\quad + 2\int_{c}^{\infty} xP(R_q > x)\,dx + E(R_q^2 1_{R_q\le -c}) \\ &=: A + B_1 + B_2 + C + D, \end{aligned}$$

say. By the right-hand equality in (5.6), $\limsup_{q\to\infty} A = c^2P(1 + \alpha^{-1}Z > -c)$. By the left-hand equality and Fatou's Lemma, followed by an integration by parts,

$$\limsup_{q\to\infty} B_1 \le 2\int_{-c}^{0} xP(\alpha^{-1}Z > x)\,dx = -c^2P(\alpha^{-1}Z > -c) + E\big((\alpha^{-1}Z)^2 1_{-c<\alpha^{-1}Z<0}\big).$$
Combining,

$$\begin{aligned} \limsup_{q\to\infty}(A + B_1) &\le c^2P(-c-1 < \alpha^{-1}Z \le -c) + E\big((\alpha^{-1}Z)^2 1_{-c<\alpha^{-1}Z<0}\big) \\ &\le c^2P(\alpha^{-1}Z \le -c) + E\big((\alpha^{-1}Z)^2 1_{Z<0}\big). \end{aligned} \tag{5.8}$$

Next, by the right-hand equality in (5.6), and the extended Fatou Lemma,

$$\limsup_{q\to\infty} B_2 \le 2\int_0^c xP(1 + \alpha^{-1}Z > x)\,dx \le 2\int_0^\infty xP(1 + \alpha^{-1}Z > x)\,dx = E\big((1 + \alpha^{-1}Z)^2 1_{1+\alpha^{-1}Z>0}\big).$$

On combining this with (5.8) and letting $c \to \infty$ we conclude that

$$\lim_{c\to\infty}\limsup_{q\to\infty}(A + B_1 + B_2) \le E\big((\alpha^{-1}Z)^2 1_{Z<0} + (1 + \alpha^{-1}Z)^2 1_{1+\alpha^{-1}Z>0}\big).$$



The upper bound in (4.3) will follow if we can show that $\lim_{c\to\infty}\limsup_{q\to\infty} C = 0$, and likewise for $D$. For $C$ this follows by inserting into its defining formula the bound $P(R_q > x) \le K/x^3$ developed in the proofs above, while for $D$ it follows from Theorem 4.2 via the uniform integrability of the family $(R_q^2)_{q\in\mathbb{N}}$. The upper bound in (4.3) is proved.

For the lower bound we fix $c > 0$ and write

$$\begin{aligned} E(R_q^2) &\ge E(R_q^2 1_{R_q>-c}) = c^2P(R_q > -c) + 2\int_{-c}^{\infty} xP(R_q > x)\,dx \\ &\ge c^2P(R_q > -c) + 2\int_{-c}^{0} xP(R_q > x)\,dx + 2\int_{0}^{c} xP(R_q > x)\,dx. \end{aligned}$$

In this right-hand side, use the left-hand equality in (5.6) on the first term, use the right-hand equality and the extended Fatou Lemma on the second term, and use the left-hand equality and Fatou's Lemma on the third term, to give

$$\liminf_{q\to\infty} E(R_q^2) \ge c^2P(\alpha^{-1}Z > -c) + 2\int_{-c}^{0} xP(1 + \alpha^{-1}Z > x)\,dx + 2\int_{0}^{c} xP(\alpha^{-1}Z > x)\,dx.$$

By two integrations by parts this becomes

$$\begin{aligned} \liminf_{q\to\infty} E(R_q^2) &\ge c^2P(\alpha^{-1}Z > -c) - c^2P(1 + \alpha^{-1}Z > -c) + E\big((1 + \alpha^{-1}Z)^2 1_{-c<1+\alpha^{-1}Z<0}\big) \\ &\quad + c^2P(\alpha^{-1}Z > c) + E\big((\alpha^{-1}Z)^2 1_{0<\alpha^{-1}Z<c}\big) \\ &= -c^2P(-c-1 < \alpha^{-1}Z \le -c) + E\big((1 + \alpha^{-1}Z)^2 1_{-c<1+\alpha^{-1}Z<0}\big) \\ &\quad + c^2P(\alpha^{-1}Z > c) + E\big((\alpha^{-1}Z)^2 1_{0<\alpha^{-1}Z<c}\big) \\ &\ge -c^2P(\alpha^{-1}Z \le -c) + E\big((1 + \alpha^{-1}Z)^2 1_{-c<1+\alpha^{-1}Z<0}\big) + E\big((\alpha^{-1}Z)^2 1_{0<\alpha^{-1}Z<c}\big). \end{aligned}$$

On letting $c \to \infty$ we obtain the lower bound in (4.3). □

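Proof of Theorem 4.5 By Lemma 4.4,

$$\begin{aligned} \limsup_{q\to\infty} E(R_q^2) &\le E\big((\alpha^{-1}Z)^2 1_{Z<0}\big) + E\big((1 + \alpha^{-1}Z)^2 1_{0<1+\alpha^{-1}Z<1}\big) + E\big((1 + \alpha^{-1}Z)^2 1_{Z>0}\big) \\ &= E\big((\alpha^{-1}Z)^2\big) + \theta(a) + P(Z > 0) + \frac{2}{\alpha}E(Z1_{Z>0}). \end{aligned} \tag{5.9}$$

Now $R_q = N_q - b_q$, so $\operatorname{var} N_q = \operatorname{var} R_q = E(R_q^2) - (ER_q)^2$. From (5.5) we have $\liminf_{q\to\infty} ER_q \ge E(\alpha^{-1}Z) = \gamma/\alpha > 0$, so $\liminf_{q\to\infty}(ER_q)^2 \ge (E(\alpha^{-1}Z))^2$. With (5.9) this gives

$$\limsup_{q\to\infty}\operatorname{var} N_q \le \operatorname{var}(\alpha^{-1}Z) + \theta(a) + P(Z > 0) + \frac{2}{\alpha}E(Z1_{Z>0}).$$

We have $\operatorname{var} Z = \pi^2/6$, while $P(Z > 0) = 1 - e^{-1}$. Also $E(Z1_{Z>0}) = \gamma - E(Z1_{Z<0})$, and, substituting $t = e^{-z}$ and then integrating by parts,

$$-E(Z1_{Z<0}) = \int_{-\infty}^{0}(-z)e^{-z}\exp(-e^{-z})\,dz = \int_{1}^{\infty}(\log t)e^{-t}\,dt = \int_{1}^{\infty}\frac{e^{-t}}{t}\,dt = E_1(1).$$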
The bound

$$\limsup_{q\to\infty}\operatorname{var} N_q \le \frac{\pi^2}{6\alpha^2} + \theta(a) + 1 - e^{-1} + \frac{2(\gamma + E_1(1))}{\alpha}$$

follows.

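For the lower bound, the lower bound in Lemma 4.4 may be written

$$\begin{aligned} \liminf_{q\to\infty} E(R_q^2) &\ge E\big((1 + \alpha^{-1}Z)^2 1_{Z<0}\big) - E\big((1 + \alpha^{-1}Z)^2 1_{0<1+\alpha^{-1}Z<1}\big) + E\big((\alpha^{-1}Z)^2 1_{Z>0}\big) \\ &= E\big((\alpha^{-1}Z)^2\big) - \theta(a) + P(Z < 0) + \frac{2}{\alpha}E(Z1_{Z<0}). \end{aligned}$$

From (5.5) we have $0 < \limsup_{q\to\infty} ER_q \le E(1 + \alpha^{-1}Z)$, so $\limsup_{q\to\infty}(ER_q)^2 \le 1 + 2E(\alpha^{-1}Z) + (E(\alpha^{-1}Z))^2$, which with the above gives

$$\liminf_{q\to\infty}\operatorname{var} R_q \ge \operatorname{var}(\alpha^{-1}Z) - \theta(a) - 1 + P(Z < 0) - 2\alpha^{-1}\big(EZ - E(Z1_{Z<0})\big).$$

Thus, since $\operatorname{var} N_q = \operatorname{var} R_q$,

$$\liminf_{q\to\infty}\operatorname{var} N_q \ge \frac{\pi^2}{6\alpha^2} - \theta(a) - 1 + e^{-1} - \frac{2(\gamma + E_1(1))}{\alpha},$$

which is the required lower bound on $\liminf_{q\to\infty}\operatorname{var} N_q$ and completes the proof. □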
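The constant $E_1(1)$, which enters Theorem 4.5 through $E(Z1_{Z>0}) = \gamma + E_1(1)$, can be checked numerically. The Python sketch below (our own check; the value of $\gamma$ is hard-coded) computes $E_1(1)$ from the standard power series $E_1(x) = -\gamma - \log x - \sum_{k\ge1}(-x)^k/(k\,k!)$ (Abramowitz and Stegun, 1965, §5.1) and compares it with Simpson-rule quadrature of $-E(Z1_{Z<0}) = \int_{-\infty}^{0}(-z)e^{-z}\exp(-e^{-z})\,dz$, truncated at $z = -6$, where the integrand is negligible.

```python
from math import exp, factorial, fsum, log

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (hard-coded)


def e1_series(x, terms=40):
    """E_1(x) = -gamma - log x - sum_{k>=1} (-x)^k / (k k!), valid for x > 0."""
    return -EULER_GAMMA - log(x) - fsum((-x) ** k / (k * factorial(k)) for k in range(1, terms + 1))


def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] with an even number n of subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    s += 4 * fsum(f(lo + i * h) for i in range(1, n, 2))
    s += 2 * fsum(f(lo + i * h) for i in range(2, n, 2))
    return s * h / 3


def integrand(z):
    """(-z) times the Gumbel density e^{-z} exp(-e^{-z})."""
    return -z * exp(-z) * exp(-exp(-z))


print(e1_series(1.0), simpson(integrand, -6.0, 0.0))  # both about 0.2194
print(1 - exp(-1.0))                                  # P(Z > 0)
```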



6 Numerical results 

Matlab and Pascal were used to evaluate $EN_q$ for different values of $a$ and $q$. Fig. 6.1 shows values of $EN_q$ for different values of $a$ for tests with up to 20 questions. For example, for a test with 10 alternatives for each question, $EN_q$ ranges from 29 when there is one question in the test to only 56 when there are 20 questions. Contrast this with the total number of possible tests, which increases from 10 to $10^{20}$ in this range.

Figure 6.1 $EN_q$, the expected number of tests that need to be generated in order for all questions to have appeared at least once, for tests with up to 20 questions and 5, 10, and 20 alternatives for each question.

These results led the authors to extend the investigation to consider tests containing up to 200 questions. Fig. 6.2 demonstrates that, as the number of questions in a test is increased, the average number of tests required in order for all possible questions to have appeared increases quite slowly. In a 200-question test with 10 alternatives for each question, there are $10^{200}$ different possible tests and a total bank of 2000 questions; however, on average all questions will have appeared at least once by the time only 78 tests have been generated. Table 6.1 summarises the results from Fig. 6.2, giving $EN_q$ for different values of $a$ and $q$.



Table 6.1 Values of $EN_q$ for various values of $a$ and $q$.

Number of alternatives     Number of questions in test (q)
for each question (a)      1       5      10      20      50     100     200
         5               11.4    17.8    20.8    23.8    27.9    31.0    34.1
        10               29.3    43.5    49.9    56.4    65.0    71.6    78.1
        20               72.0   102.0   115.3   128.7   146.5   160.0   173.5



Figure 6.2 $EN_q$, the average number of tests that need to be generated in order for all questions to have appeared at least once, for tests with up to 200 questions and 5, 10, and 20 alternatives for each question. (Horizontal axis: $q$, number of questions.)

7 Discussion

The asymptotics concern the behaviour of the random variable $N_q$, defined in §1, as the number of questions, $q$, grows. There is also dependence on $a$, the number of alternatives per question (the size of each question bank), but we regard $a$ as fixed; it is any integer at least 2, and we set

$$\alpha := \log\Big(\frac{a}{a-1}\Big),$$

so $\alpha > 0$. Theorem 4.1 first says that $N_q$ cannot be centred and normed so that its distribution properly converges (one could get convergence to 0, of course, just by heavy norming). However, it then says that by centring (translation) alone, $N_q$ comes very close to looking like the random variable $Z/\alpha$, where $Z$ has the Gumbel distribution. The difference is a 'wobble' of between 0 and 1 in the limit; persistence of discreteness is responsible for this.

Theorem 4.3 establishes that the expected value of $N_q$ behaves accordingly, growing like $\alpha^{-1}\log q$. More exactly, after centring by $b_q := \alpha^{-1}\log(aq)$ it differs from $\alpha^{-1}EZ$ by a number between 0 and 1 in the limit. Table 7.1 gives values of $b_q + \alpha^{-1}EZ$ for different values of $a$ and $q$. For $q \ge 20$ the actual values of $EN_q$ in Table 6.1 exceed these by 0.5 to 0.6, within the range from 0 to 1 that Theorem 4.3 predicts.
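The entries of Table 7.1 follow from $b_q + \alpha^{-1}EZ = (\log(aq) + \gamma)/\alpha$. The Python sketch below (our own illustration; $\gamma$ is hard-coded and `expected_max` re-implements the series (3.1) with (2.1) for $P(Y > n)$) prints these centring values alongside the difference $EN_q - (b_q + \alpha^{-1}EZ)$, which Theorem 4.3 places between 0 and 1 in the limit.

```python
from math import comb, expm1, fsum, log, log1p

EULER_GAMMA = 0.5772156649015329


def centring(a, q):
    """b_q + E(Z)/alpha = (log(a q) + gamma) / alpha, as tabulated in Table 7.1."""
    return (log(a * q) + EULER_GAMMA) / log(a / (a - 1))


def expected_max(a, q, tol=1e-14):
    """E N_q from the series (3.1), using (2.1) for P(Y > n)."""
    total, n = 0.0, 0
    while True:
        t = 1.0 if n < a else fsum(
            (-1) ** (k + 1) * comb(a, k) * (1 - k / a) ** n for k in range(1, a + 1))
        term = 1.0 if t >= 1.0 else -expm1(q * log1p(-t))
        total += term
        if n >= a and term < tol:
            return total
        n += 1


for a in (5, 10, 20):
    for q in (20, 50, 100, 200):
        c = centring(a, q)
        print(a, q, round(c, 1), round(expected_max(a, q) - c, 2))
```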

Table 7.1 Values of $b_q + \alpha^{-1}EZ$ for various values of $a$ and $q$.

Number of alternatives     Number of questions in test (q)
for each question (a)      1       5      10      20      50     100     200
         5                9.8    17.0    20.1    23.2    27.3    30.4    33.5
        10               27.3    42.6    49.2    55.8    64.5    71.0    77.6
        20               69.7   101.0   114.5   128.1   145.9   159.4   173.0

What about the variance of $N_q$ as $q$ grows? Theorem 4.5 says that it does not tend to infinity, but is trapped as $q \to \infty$ between bounds that do not depend on $q$. The precision is pleasing, given that $N_q$ does not converge, in any sense. The asymptotic bounds on the variance of $N_q$, $\operatorname{var} N_q$, are $\pi^2/(6\alpha^2) \pm \Delta$, where $\Delta$ is a strange jumble of constants:

$$\Delta = \theta(a) + 1 - e^{-1} + \frac{2(\gamma + E_1(1))}{\alpha}$$

(the bounds are not claimed to be sharp).



The amount of variability can be better appreciated through the standard deviation. The asymptotic bounds on the standard deviation of $N_q$ are

$$\sqrt{\frac{\pi^2}{6\alpha^2} - \Delta} \quad\text{and}\quad \sqrt{\frac{\pi^2}{6\alpha^2} + \Delta},$$

and some values for these are in Table 7.2. The lower bound is non-trivial, i.e. positive, in each case.

Table 7.2 Asymptotic bounds on the standard deviation of $N_q$.

a            2       3       4       5       10      20
Min s.d.   0.641   2.323   3.697   5.024   11.507  24.362
Max s.d.   2.537   3.823   5.107   6.390   12.804  25.630
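Table 7.2 can be regenerated from the formula for $\Delta$. In the sketch below (Python; our own check, not the authors' code) $\theta(a)$ is computed by Simpson's rule after the substitution $z = \alpha(u-1)$, so that $\theta(a) = \alpha\int_0^1 u^2\lambda(\alpha(u-1))\,du$ with $\lambda(z) = e^{-z}\exp(-e^{-z})$ the Gumbel density; $\gamma$ and $E_1(1) \approx 0.219384$ are hard-coded constants.

```python
from math import exp, fsum, log, pi, sqrt

EULER_GAMMA = 0.5772156649015329
E1_AT_1 = 0.21938393439552029  # exponential integral E_1(1)


def gumbel_density(z):
    return exp(-z) * exp(-exp(-z))


def theta(a, n=1000):
    """theta(a) = E[(1 + Z/alpha)^2; 0 < 1 + Z/alpha < 1] by composite Simpson's rule."""
    alpha = log(a / (a - 1))

    def f(u):
        # integrand after substituting z = alpha * (u - 1), u in [0, 1]
        return u * u * gumbel_density(alpha * (u - 1))

    h = 1.0 / n
    s = f(0.0) + f(1.0)
    s += 4 * fsum(f(i * h) for i in range(1, n, 2))
    s += 2 * fsum(f(i * h) for i in range(2, n, 2))
    return alpha * s * h / 3


def sd_bounds(a):
    """Asymptotic bounds on the s.d. of N_q: sqrt(pi^2/(6 alpha^2) -/+ Delta)."""
    alpha = log(a / (a - 1))
    centre = pi ** 2 / (6 * alpha ** 2)
    delta = theta(a) + 1 - exp(-1) + 2 * (EULER_GAMMA + E1_AT_1) / alpha
    return sqrt(centre - delta), sqrt(centre + delta)


for a in (2, 3, 4, 5, 10, 20):
    lo, hi = sd_bounds(a)
    print(f"a={a:2d}  min s.d. {lo:.3f}  max s.d. {hi:.3f}")
```

Most of $\Delta$ comes from the term $2(\gamma + E_1(1))/\alpha$, which grows like $a$, while $\theta(a)$ shrinks as $a$ increases; this is why the gap between the two bounds stays of order 1 to 2 in Table 7.2.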